tarun005 / flavr
Code for FLAVR: A fast and efficient frame interpolation technique.
License: Apache License 2.0
Hey @tarun005, I used the 8x pretrained model on this video. The output seems blurry, mostly at the edges. Can this be improved?
Hi Tarun, excellent work on video interpolation!
I tried to run your code, but I ran into some trouble. I set my config as:
batch_size=2, beta1=0.9, beta2=0.99, checkpoint_dir='.', cuda=True, data_root='/home/Changchen/dataset./vimeo_septuplet', dataset='vimeo90K', exp_name='exp', joinType='concat', load_from=None, log_iter=60, loss='1*L1', lr=0.0002, max_epoch=50, model='unet_18', n_outputs=1, nbr_frame=4, nbr_width=1, num_gpu=1, num_workers=16, pretrained=None, random_seed=12345, resume=False, resume_exp=None, start_epoch=0, test_batch_size=1, upmode='transpose', use_tensorboard=False, val_freq=1
At the beginning, PSNR was normal (about 20), but it has gradually decreased to about 14. I wonder why training seems to be diverging.
Thank you for any help!
At first I tried to use Flowframes, but since it gave an error, I followed your instructions on GitHub instead. When I ran
python interpolate.py --input_video input.mp4 --factor 8 --load_model FLAVR8X.pth
I got a very similar, if not identical, error message:
13.000209881905063
Traceback (most recent call last):
  File "interpolate.py", line 133, in <module>
    videoTensor , resizes = video_transform(videoTensor , args.downscale)
  File "interpolate.py", line 121, in video_transform
    videoTensor = transforms(videoTensor)
  File "C:\Users\frangamer1892roblox\MiniConda3\lib\site-packages\torchvision\transforms\transforms.py", line 60, in __call__
    img = t(img)
  File "D:\FLAVR\dataset\transforms.py", line 333, in __call__
    return to_tensor(clip)
  File "D:\FLAVR\dataset\transforms.py", line 107, in to_tensor
    return clip.float().permute(3, 0, 1, 2) / 255.0
RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:79] data. DefaultCPUAllocator: not enough memory: you tried to allocate 25944883200 bytes.
I am not really sure what to do now. These are my specs, in case they help:
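The size in the error can be sanity-checked. The traceback's to_tensor converts the whole clip to float32 at once (clip.float()), so memory grows as frames x height x width x 3 channels x 4 bytes. The sketch below does that arithmetic with hypothetical helper names; the 1280x720 resolution is an assumption for illustration, under which 25,944,883,200 bytes is exactly 2,346 frames. The following / 255.0 allocates a second tensor of the same size, so peak usage can be roughly double.

```python
def video_tensor_bytes(num_frames: int, height: int, width: int,
                       channels: int = 3, bytes_per_elem: int = 4) -> int:
    """Bytes needed to hold a whole clip as one dense float32 tensor (T, H, W, C)."""
    return num_frames * height * width * channels * bytes_per_elem


def frames_that_fit(budget_bytes: int, height: int, width: int,
                    channels: int = 3, bytes_per_elem: int = 4) -> int:
    """How many frames of the given size fit into a memory budget."""
    return budget_bytes // video_tensor_bytes(1, height, width, channels, bytes_per_elem)


requested = 25_944_883_200  # from the RuntimeError above
# Assumption: a 1280x720 RGB video.
print(requested / video_tensor_bytes(1, 720, 1280))  # 2346.0
print(frames_that_fit(16 * 1024**3, 720, 1280))      # 1553 frames fit in 16 GiB
```

Processing the video in shorter chunks, or at a lower resolution via the script's downscale option, keeps this product small.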
I find that your model performs poorly on low-frame-rate video (5-10 fps). How can I fix this? Would training on low-frame-rate videos help? Thanks a lot.
Hi there,
When I try to run test.py with the pretrained model that you provided, I get an error saying that xxxxxx/sep_trainlist.txt (under my input data path) does not exist. Can you tell me how to fix this? Thank you
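The error means the loader expects a list file inside the data root. Assuming the standard Vimeo-90K septuplet layout (data_root/sequences/&lt;group&gt;/&lt;clip&gt;/im1.png ... im7.png, with one "&lt;group&gt;/&lt;clip&gt;" entry per line in sep_trainlist.txt and sep_testlist.txt), a list file can be regenerated from whichever clips are present. write_septuplet_list is a hypothetical helper; check the expected layout against dataset/vimeo90k_septuplet.py before relying on it.

```python
import os


def write_septuplet_list(data_root: str, list_name: str = "sep_testlist.txt") -> list:
    """Scan data_root/sequences/<group>/<clip>/ and list every clip that has im1..im7.png.

    Assumed layout (standard Vimeo-90K septuplet):
        data_root/sequences/00001/0001/im1.png ... im7.png
    Each written line is "<group>/<clip>", e.g. "00001/0001".
    """
    seq_root = os.path.join(data_root, "sequences")
    entries = []
    for group in sorted(os.listdir(seq_root)):
        group_dir = os.path.join(seq_root, group)
        if not os.path.isdir(group_dir):
            continue
        for clip in sorted(os.listdir(group_dir)):
            clip_dir = os.path.join(group_dir, clip)
            frames = [os.path.join(clip_dir, f"im{i}.png") for i in range(1, 8)]
            # Only complete 7-frame clips are listed.
            if all(os.path.isfile(f) for f in frames):
                entries.append(f"{group}/{clip}")
    with open(os.path.join(data_root, list_name), "w") as fh:
        fh.write("\n".join(entries) + ("\n" if entries else ""))
    return entries
```

Calling it once with "sep_trainlist.txt" and once with "sep_testlist.txt" would produce both files.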
Hi Tarun,
It is remarkable that your inference at 16x or higher factors is faster than Super SloMo while still performing well. Will you publish trained models for 16x and higher and add support for them afterward?
Hi, thank you for releasing the code.
Your paper says: "We use 8 GPUs and a mini-batch of 32 to train each model, and training is completed in about 36 hours for 2× a … on 2080ti."
But I found that 200-epoch training on Vimeo-90K takes at least 5 days on 8 V100 GPUs.
Is there something I have overlooked?
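One way to compare the two timings is to turn each into a per-iteration time budget. The sketch assumes 64,612 training septuplets (the commonly cited size of the Vimeo-90K septuplet training split), a global batch of 32, and 200 epochs as in this issue. The paper excerpt is truncated, so the epoch count behind its 36-hour figure is not known here.

```python
import math


def iterations(num_samples: int, batch_size: int, epochs: int) -> int:
    """Total optimizer steps, counting a final partial batch in each epoch."""
    return math.ceil(num_samples / batch_size) * epochs


def seconds_per_iteration(total_hours: float, total_iters: int) -> float:
    """Average wall-clock time per step implied by a total training time."""
    return total_hours * 3600 / total_iters


# Assumptions: 64,612 training clips, batch 32, 200 epochs.
steps = iterations(64_612, 32, 200)
print(steps)                                          # 404000
print(round(seconds_per_iteration(36, steps), 3))     # 0.321 s/step for "about 36 hours"
print(round(seconds_per_iteration(5 * 24, steps), 3)) # 1.069 s/step for "5 days"
```

If the per-step time measured on your machine is far above the first budget, data loading or a different epoch count is a likely place to look.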
Hi,
I have a question about the evaluation of the 8x and 4x cases in Table 2 and Table 3 for the Adobe dataset in the paper. The 4x case seems to have much higher PSNR than the 8x case.
Let's say the 7 intermediate frames are denoted as t1, t2, t3, t4, t5, t6, t7. To my understanding the PSNR values are normally:
(t1 close to t7) > (t2 close to t6) > (t3 close to t5) > t4
At least, this is what I have observed for DAIN, SuperSloMo, and QVI. It is expected that as the temporal distance to the input frames increases, the interpolation quality decreases (lower PSNR).
For 4x, you would only have t2, t4, t6, so the average PSNR should be expected to be lower than for 8x.
However, for 4x in Table 3, FLAVR is 5.62 dB higher than for 8x in Table 2, and the other methods (DAIN, QVI, and SuperSloMo) also show much higher PSNR. To my understanding, 5.62 dB is a huge increase.
The expected trend should be similar to Table 3 in the BMBC paper: https://arxiv.org/pdf/2007.12622.pdf
where PSNR(2x) < PSNR(4x) < PSNR(8x).
Is there anything I missed about the evaluation that explains this?
Thanks
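The averaging argument above can be checked with a toy calculation. The per-position PSNR values below are hypothetical, chosen only to follow the described trend (best next to the inputs, worst at t4); they are not measurements from any paper.

```python
from statistics import mean

# Hypothetical PSNR (dB) for the 7 intermediate frames t1..t7 of an 8x setting.
psnr_8x = {1: 34.0, 2: 32.0, 3: 30.0, 4: 29.0, 5: 30.0, 6: 32.0, 7: 34.0}

avg_8x = mean(psnr_8x.values())                      # all seven positions
avg_4x_subset = mean(psnr_8x[t] for t in (2, 4, 6))  # positions a 4x model predicts

print(round(avg_8x, 2), round(avg_4x_subset, 2))     # 31.57 31.0
```

Under these assumptions the t2/t4/t6 subset averages lower than all seven positions, which is the intuition behind the question.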
Hi,
I used the official AdaCoF code and the weights that you provided, but only achieved 34.93 dB on the Vimeo dataset, which is much lower than 35.40 dB. Could you please share your evaluation script with me? Many thanks!
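Gaps of this size for the same weights can come from evaluation details rather than from the model. The numpy sketch below is a generic PSNR helper (the names are hypothetical, not taken from AdaCoF or FLAVR) that exposes two such details: clamping plus 8-bit quantization of the prediction, and averaging per-image PSNR rather than pooling MSE first. Which conventions the paper used is not stated here.

```python
import numpy as np


def psnr(pred: np.ndarray, gt: np.ndarray, data_range: float = 1.0,
         quantize: bool = False) -> float:
    """PSNR in dB between two images with values in [0, data_range].

    quantize=True rounds the prediction to 8-bit levels first, as happens
    when frames are saved to PNG before scoring.
    """
    pred = np.clip(pred.astype(np.float64), 0.0, data_range)
    gt = gt.astype(np.float64)
    if quantize:
        pred = np.round(pred / data_range * 255.0) / 255.0 * data_range
    mse = np.mean((pred - gt) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))


def dataset_psnr(pairs) -> float:
    """Mean of per-image PSNRs; averaging MSE over the dataset first gives a different number."""
    return float(np.mean([psnr(p, g) for p, g in pairs]))
```

Scoring the same outputs with quantize on and off (and with both averaging conventions) shows how much of the gap these choices alone can explain.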
Hi, thanks for your code!
Does this version of the code include the Gating Module?
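When checking whether a code version contains a gating module, it helps to know what such a block computes. The numpy sketch shows squeeze-and-excitation-style channel gating on a (C, T, H, W) feature volume. Treating FLAVR's gating as exactly this form (global average pooling, one linear layer, sigmoid) is an assumption, and channel_gate is a hypothetical name. In the model code, a module that pools features and multiplies them by a sigmoid output would be the thing to look for.

```python
import numpy as np


def channel_gate(features: np.ndarray, weight: np.ndarray, bias: np.ndarray):
    """Squeeze-and-excitation-style gating on a (C, T, H, W) feature volume.

    1. Global-average-pool each channel over time and space -> (C,)
    2. Linear layer (same as a 1x1x1 conv on the pooled vector) + sigmoid -> gates in (0, 1)
    3. Rescale every channel by its gate.
    Returns the gated features and the gate vector.
    """
    pooled = features.mean(axis=(1, 2, 3))                    # (C,)
    gates = 1.0 / (1.0 + np.exp(-(weight @ pooled + bias)))   # (C,)
    return features * gates[:, None, None, None], gates
```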
I'm running this on Ubuntu with a 3090 on a video file (a 4K MKV, if that somehow matters; it seems to read fine), but it fails silently at line 115 (commit d2c9ba9).
Any thoughts on debugging?
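One common reason a Python process on Linux stops without a traceback is that it was killed by a signal, for example SIGKILL from the kernel's OOM killer when a large (e.g. 4K) video is loaded into RAM. Whether that applies here is only a guess. Launching the script from a small wrapper and inspecting the return code distinguishes a crash from a kill; describe_exit is a hypothetical helper using only the standard library.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Explain a subprocess return code; on POSIX, -N means 'killed by signal N'."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        hint = " (SIGKILL often comes from the Linux OOM killer)" if name == "SIGKILL" else ""
        return f"killed by {name}{hint}"
    return f"exited with status {returncode}"
```

For example, pass it subprocess.run([sys.executable, "interpolate.py", ...]).returncode with your usual arguments in place of the ellipsis. On Linux, dmesg typically also records OOM-killer events.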
Your work is impressive! Hello, I'd like to ask you a few questions. I downloaded your code for training, set the batch size to 6, reduced the data volume to around 20,000 samples, and also used Vimeo. But after training for 70-odd epochs, the loss, PSNR, and SSIM on the training and test sets have not converged. The learning rate has already dropped to a low value, so I don't think continuing training is worthwhile. The final PSNR is below 20. Why is my training result so poor and so far from the numbers in the paper? Can you give some advice?
Sorry to bother you.
Is there any code for predicting optical flow based on your model?
I would appreciate it very much.
Hi,
What batch size is used for training FLAVR for 8x VFI? The paper says 32 with a frame size of 512x512, but when I train on 8 GPUs (1080 Ti, which has the same memory as the 2080 Ti), I get an OOM error, and only a batch size of 16 works. Besides that, neither random frame-order reversal nor random horizontal flipping can be found as augmentation in GoPro.py. Can I reproduce the results with this code?
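The two augmentations mentioned (random frame-order reversal and random horizontal flipping) are straightforward to add to a clip loader if they are missing. The numpy sketch below is not the repository's code: augment_clip is a hypothetical helper, and it assumes frames are H x W x C arrays. The key detail is drawing each random decision once per clip so that inputs and targets stay consistent.

```python
import random

import numpy as np


def augment_clip(frames, gt_frames, rng: random.Random, p_reverse=0.5, p_flip=0.5):
    """Apply the same random temporal reversal and horizontal flip to a whole clip.

    frames / gt_frames: lists of HxWxC arrays in temporal order. Reversing the
    inputs also reverses the order of the intermediate targets.
    """
    if rng.random() < p_reverse:
        frames, gt_frames = frames[::-1], gt_frames[::-1]
    if rng.random() < p_flip:
        frames = [np.ascontiguousarray(f[:, ::-1]) for f in frames]
        gt_frames = [np.ascontiguousarray(f[:, ::-1]) for f in gt_frames]
    return frames, gt_frames
```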
I have 49 video frames, and the length of videoTensor matches (49).
However, idxs ends up being only 46 long, so the first and last frames are not interpolated.
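The count 49 → 46 is what you would get if the script takes every run of 4 consecutive frames with stride 1 (as a torch.Tensor.unfold(0, 4, 1) call would), which is an assumption about the code. Each 4-frame window then covers only the gap between its 2nd and 3rd frames, so the first and last gaps get no window. The pure-Python sketch reproduces that indexing and one possible fix, repeating the edge frames; sliding_windows is a hypothetical helper written for even window sizes.

```python
def sliding_windows(num_frames: int, window: int = 4, pad_edges: bool = False):
    """Index windows of `window` consecutive frames, stride 1.

    Without padding there are num_frames - window + 1 windows. With
    pad_edges=True the first/last indices are repeated so every gap between
    neighbouring frames becomes the middle gap of some window.
    """
    idx = list(range(num_frames))
    if pad_edges:
        extra = (window - 2) // 2
        idx = [0] * extra + idx + [num_frames - 1] * extra
    return [idx[i:i + window] for i in range(len(idx) - window + 1)]


print(len(sliding_windows(49)))                  # 46
print(len(sliding_windows(49, pad_edges=True)))  # 48, one per gap
```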
Hi,
When I tested interpolate.py on a custom video, I found that the video size changed: the input size is (960, 540) and the output size is (960, 536). This is where I found the problem:
downscale = int(downscale * 8)
resizes = 8 * (H // downscale), 8 * (W // downscale)
So I changed the code to:
resizes = (8 * H // downscale), (8 * W // downscale)
But this raises an error.
How can I fix this?
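The original line rounds each side down to a multiple of 8 (with the default downscale, 8 * (540 // 8) = 536), which suggests the network needs sizes divisible by 8; that is an inference from this code, not a documented requirement. The edited line removes the rounding, so 540 reaches the model unchanged, which is a likely source of the error. An alternative is to pad each side up to the next multiple of 8 before inference and crop the output back to the original size. The sketch only computes the padding (pad_to_multiple is hypothetical); the tensor padding itself could be done with, for example, torch.nn.functional.pad.

```python
def pad_to_multiple(size: int, multiple: int = 8):
    """Return (padded_size, pad_amount) with padded_size the next multiple of `multiple`."""
    padded = -(-size // multiple) * multiple  # ceiling division
    return padded, padded - size


h, w = 540, 960
print(pad_to_multiple(h), pad_to_multiple(w))  # (544, 4) (960, 0)
```

After inference, keeping the first 540 rows (and 960 columns) of each output frame restores the input size.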
Apparently interpolation ignores the first and last file. Thus 5 input files means 3 are interpolated, yielding (3-1)*4+1 frames, i.e. 9. Should it not be (5-1)*4+1, i.e. 17 frames?
One can replicate the problem with a test image set here: https://github.com/lucaskuzma/FLAVR/blob/main/notebooks/Updated_Fast_Frame_Interpolation_with_FLAVR.ipynb
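The counts in this report are consistent with each output window covering one gap between input frames, with a 4-frame context leaving the first and last gaps uncovered. The sketch reproduces both numbers under that assumption; output_frame_count and its pad_edges option are hypothetical, where padding stands for repeating the first and last frames so every gap gets a window.

```python
def output_frame_count(num_inputs: int, factor: int, window: int = 4,
                       pad_edges: bool = False) -> int:
    """Frames written for `num_inputs` input frames at a given interpolation factor.

    Each covered gap contributes its left input frame plus factor - 1 new frames,
    and the last covered input frame closes the sequence.
    """
    gaps = num_inputs - 1 if pad_edges else max(num_inputs - window + 1, 0)
    return gaps * factor + 1 if gaps else 0


print(output_frame_count(5, 4))                  # 9  = (3-1)*4+1
print(output_frame_count(5, 4, pad_edges=True))  # 17 = (5-1)*4+1
```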
Hi, I was reading your work and was wondering: how do you obtain the highest attention weights for the feature maps in Figure 5? Do you just sum up the tensor along the channel dimension and sort that, or do you use some other method?
Thanks!
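One plausible procedure, assuming the gating produces one weight per channel, is to sort that gate vector and then visualize the top channels' feature maps. The alternative the question mentions would rank channels by summed activation instead. This is not confirmed as the method behind Figure 5, and both helpers below are hypothetical.

```python
import numpy as np


def top_gated_channels(gates: np.ndarray, k: int = 3):
    """Channel indices sorted by gate value, highest first."""
    return [int(i) for i in np.argsort(gates)[::-1][:k]]


def channel_map(features: np.ndarray, channel: int) -> np.ndarray:
    """(H, W) map for one channel of a (C, T, H, W) volume, averaged over time for display."""
    return features[channel].mean(axis=0)
```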
Where is the motion magnification model?
Hi,
I am impressed with your new video frame interpolation paper.
When I tested it, I got 32.59 dB on the Vimeo-90K triplet test set.
Following your Middleburry.py in the dataset directory, I changed the VimeoSepTuplet class into a VimeoTriplet class as shown below.
What is wrong with my modified code?
I am wondering if I could get custom triplet interpolation code which takes two input frames and yields an intermediate frame.
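import os
from PIL import Image
from torch.utils.data import Dataset
from torchvision.transforms.functional import to_tensor

class VimeoTriplet(Dataset):
    """Vimeo-90K triplet test set: im1/im3 are the inputs, im2 is the ground truth."""

    def __init__(self, data_root):
        self.data_root = data_root
        self.image_root = os.path.join(self.data_root, 'sequences')
        test_fn = os.path.join(self.data_root, 'tri_testlist.txt')
        with open(test_fn, 'r') as txt:
            # Skip blank lines (e.g. a trailing newline at the end of the list file).
            self.seq_list = [line.strip() for line in txt if line.strip()]

    def __getitem__(self, index):
        seq_dir = os.path.join(self.image_root, self.seq_list[index])
        im1 = Image.open(os.path.join(seq_dir, 'im1.png')).convert('RGB')
        gt = Image.open(os.path.join(seq_dir, 'im2.png')).convert('RGB')
        im3 = Image.open(os.path.join(seq_dir, 'im3.png')).convert('RGB')
        im1, gt, im3 = map(to_tensor, (im1, gt, im3))
        # The model takes 4 input frames, so each outer frame is duplicated.
        return [im1, im1, im3, im3], [gt]

    def __len__(self):
        return len(self.seq_list)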
Due to latency constraints, could FLAVR use only past frames for multi-frame prediction?
Thanks,
Hi, I found the original UCF101 dataset in AVI format and the UCF101 triplet dataset in PNG format, but no 5-frame version is available. Could you describe how to generate the UCF101 test set for FLAVR?
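One way to build 5-frame test tuples from the AVI release is to extract every frame (for example with ffmpeg) and then group consecutive frames. The sketch shows only the grouping step; make_tuples is a hypothetical helper, and the stride and the choice of the middle frame as the target (with the other four as inputs) are assumptions, not the procedure used for the paper.

```python
def make_tuples(frame_paths, tuple_len: int = 5, stride: int = 5):
    """Group an ordered list of frame paths into fixed-length tuples.

    stride == tuple_len gives non-overlapping tuples; stride 1 gives every window.
    Leftover frames at the end that cannot fill a tuple are dropped.
    """
    return [frame_paths[i:i + tuple_len]
            for i in range(0, len(frame_paths) - tuple_len + 1, stride)]


frames = [f"{i:04d}.png" for i in range(12)]
for t in make_tuples(frames):
    inputs, target = t[:2] + t[3:], t[2]  # assumed split: middle frame is the target
    print(inputs, target)
```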
Hi Tarun,
Your work is really interesting!
I was wondering if you could share the models trained in the ablation experiments.
I am curious how the different loss functions affect downstream tasks like action recognition. In the paper, you mention that L1 loss results in sharper images. Does this also translate into better action recognition results?
Please do share any experiments/insights regarding this. I would love to hear your thoughts.
Hi, thanks for your code. I tested the pretrained 8x model on the GoPro test set and on the whole Adobe dataset and got PSNRs of 31.31 and 31.83 dB, respectively. The GoPro result matches the paper, but the Adobe result does not (32.20 dB in the paper).
Were the results in the paper computed on something other than the whole Adobe dataset? Could you provide more information about the 8x interpolation experiment on Adobe?
Thanks!
Hey,
I've managed to get FLAVR up and running, right up until the final stage. I'm using a directory containing a PNG sequence, which runs through the network successfully. But when it comes to writing the output, I simply get:
Writing to in_2xmp4.mp4
in_2xmp4: No such file or directory
Traceback (most recent call last):
File "interpolate.py", line 164, in <module>
os.remove(output_video)
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'in_2xmp4'
I'm reading a sequence of PNGs from a directory using is_folder, which is great. Is there a way to write out a sequence of PNGs rather than a video?
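If the interpolated frames are available as H x W x 3 uint8 arrays (e.g. via a NumPy array's .tobytes()), they can be dumped as a numbered PNG sequence instead of being encoded to video. The dependency-free sketch below uses hypothetical helper names that are not part of interpolate.py; in practice, PIL's Image.fromarray(arr).save(path) does the per-frame step in one line.

```python
import os
import struct
import zlib


def _png_chunk(kind: bytes, data: bytes) -> bytes:
    """One PNG chunk: 4-byte length, 4-byte type, data, CRC-32 of type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def write_png_rgb(path: str, rgb: bytes, width: int, height: int) -> None:
    """Write 8-bit RGB pixels (row-major, len == width * height * 3) as a PNG file."""
    if len(rgb) != width * height * 3:
        raise ValueError("pixel buffer size does not match width * height * 3")
    stride = width * 3
    # Every scanline starts with a filter-type byte; 0 means "no filter".
    raw = b"".join(b"\x00" + rgb[y * stride:(y + 1) * stride] for y in range(height))
    # IHDR: width, height, bit depth 8, colour type 2 (RGB), default compression/filter, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    with open(path, "wb") as fh:
        fh.write(b"\x89PNG\r\n\x1a\n")
        fh.write(_png_chunk(b"IHDR", ihdr))
        fh.write(_png_chunk(b"IDAT", zlib.compress(raw)))
        fh.write(_png_chunk(b"IEND", b""))


def write_png_sequence(frames, out_dir: str, width: int, height: int,
                       prefix: str = "frame") -> list:
    """Write each RGB buffer in `frames` as out_dir/<prefix>_000000.png, _000001.png, ..."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, rgb in enumerate(frames):
        path = os.path.join(out_dir, f"{prefix}_{i:06d}.png")
        write_png_rgb(path, bytes(rgb), width, height)
        paths.append(path)
    return paths
```

Zero-padded names keep the files in order for tools such as ffmpeg's image2 demuxer if you later want a video after all.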
Unlike other video interpolation implementations such as RIFE (https://github.com/hzwer/Arxiv2020-RIFE), this code loads all frames from disk into RAM.
This makes it impossible to interpolate videos longer than a minute or two unless you have insane amounts of RAM.
It would be great if frames could instead be loaded on the fly using a buffer, as RIFE does.
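A bounded buffer is enough to feed a model that needs 4 consecutive frames while holding only those 4 in memory. The sketch below is a hypothetical helper, not RIFE's or FLAVR's code; frames can come from any lazy source (e.g. a generator wrapping a video decoder), and outputs would be written out as each window is processed.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def stream_windows(frames: Iterable[T], window: int = 4) -> Iterator[Tuple[T, ...]]:
    """Yield consecutive `window`-frame tuples while keeping only `window` frames in memory."""
    buf = deque(maxlen=window)  # the oldest frame is dropped automatically
    for frame in frames:
        buf.append(frame)
        if len(buf) == window:
            yield tuple(buf)


for w in stream_windows(range(7)):
    print(w)  # (0, 1, 2, 3), then (1, 2, 3, 4), ... up to (3, 4, 5, 6)
```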
Hi, I've been trying to train this network on an A100 GPU. However, since torch 1.5.0 doesn't support this GPU, I am forced to use torch 1.9.0. Training is broken for torch versions > 1.5.0, but I cannot find the reason why. I have looked at the differences between the torch versions, but nothing makes it clear why this happens. Basically, the model stays stuck at around 20 dB for the whole training run. I previously tested this code on a 1080 Ti with torch 1.5.0 and it worked fine, but due to memory constraints and training time, the A100 would be the better option.
Do you have any idea why this occurs and any possible solutions?
Thanks
In https://github.com/tarun005/FLAVR/blob/main/dataset/Davis_test.py, do you use DAVIS's training or testing set? The paper says 2847 quintuples are generated in total, but I found the training set can generate 2849 quintuples, while the testing set can generate 963 quintuples.
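Tuple counts like these depend on the grouping convention as much as on the split, so it may help to compute them explicitly from each video's frame count. The sketch assumes fixed-length tuples taken with a constant stride, discarding leftover frames; count_tuples is a hypothetical helper, and the convention is not confirmed to match Davis_test.py.

```python
def count_tuples(frame_counts, tuple_len: int = 5, stride: int = 5) -> int:
    """Total number of tuples across videos, given each video's frame count."""
    return sum(max((n - tuple_len) // stride + 1, 0) for n in frame_counts)


# Hypothetical frame counts for three videos.
print(count_tuples([10, 9, 4]))            # 3 with non-overlapping tuples
print(count_tuples([10, 9, 4], stride=1))  # 11 with every window
```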
Running interpolate.py on a 5-second 960x1080 video makes RAM usage grow steadily, by about 100 MB every 10 seconds.
Hi, I am new to PyTorch and want to make sure I understand. Do you use this line to ensure that the augmentations applied to all frames (e.g. randomcrop(), randomflip(), colorjitter()) are identical?
FLAVR/dataset/vimeo90k_septuplet.py, line 66 (commit 8896a0f)
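The idea behind re-seeding per frame can be shown in isolation: if one seed is drawn per clip and the generator is reset to it before transforming each frame, every frame receives the same sequence of random draws (same crop offset, same flip decision). The sketch uses Python's random module with a hypothetical crop helper; it illustrates the mechanism and is not the repository's code.

```python
import random


def random_crop_offset(height: int, width: int, crop: int):
    """Pick a random top-left corner for a crop x crop patch using Python's `random`."""
    return random.randint(0, height - crop), random.randint(0, width - crop)


def same_crop_for_all_frames(num_frames: int, height: int, width: int, crop: int):
    """Draw one seed per clip, then re-seed before each frame so all frames get the same draws."""
    seed = random.randint(0, 2**32)
    offsets = []
    for _ in range(num_frames):
        random.seed(seed)
        offsets.append(random_crop_offset(height, width, crop))
    return offsets


print(same_crop_for_all_frames(7, 256, 448, 256))  # seven identical (top, left) pairs
```

This only synchronizes transforms that draw from Python's random module. Recent torchvision releases draw transform randomness from torch's generator instead, in which case torch.manual_seed(seed) may also be needed for the same effect.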
Hi Tarun,
From issue #32, I know we can cascade different models to reach higher interpolation factors, e.g. cascading the 2x and 8x models for 16x interpolation. But how is the cascade done? Do I first use the 2x model to generate a 2x slow-motion sequence and then apply the 8x model to that sequence?
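Cascading in the order described multiplies the factors: a 2x pass followed by an 8x pass yields the same frame count as a single 16x pass. The arithmetic sketch below assumes every gap, including the first and last, is interpolated in each pass (e.g. by padding the edges); otherwise each pass drops some frames at the ends, as other reports here describe. The helper names are hypothetical.

```python
def frames_after(num_frames: int, factor: int) -> int:
    """Frame count after inserting factor - 1 frames into every gap."""
    return (num_frames - 1) * factor + 1


def cascade(num_frames: int, factors) -> int:
    """Apply several interpolation passes in sequence."""
    for f in factors:
        num_frames = frames_after(num_frames, f)
    return num_frames


print(cascade(10, [2, 8]), frames_after(10, 16))  # 145 145
```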
To the best of my knowledge, other methods train their networks on vimeo_triplet, which contains three consecutive frames, rather than on vimeo_septuplet (used in video super-resolution). Why do you use the "vimeo_septuplet" dataset for testing? Is that a fair comparison with existing approaches?
Can I train with one 1080 Ti GPU (12GB)? Thank you
Hello! Since I have not trained or tested the network myself, I would like to know how many parameters this model has. Thanks
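The parameter count does not require training; it can be computed from the model definition alone. For a PyTorch model the usual one-liner is sum(p.numel() for p in model.parameters()). The framework-free sketch below does the same from a list of tensor shapes (count_parameters is a hypothetical helper), so the arithmetic is easy to check.

```python
import math


def count_parameters(shapes) -> int:
    """Total number of scalar parameters, given each parameter tensor's shape."""
    return sum(math.prod(s) for s in shapes)


# Example: a Linear(10 -> 5) layer has a 5x10 weight and a 5-element bias.
print(count_parameters([(5, 10), (5,)]))  # 55
```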
When I try to interpolate a video, this error pops up:
  File "interpolate.py", line 120, in <module>
    videoTensor = video_to_tensor(input_video)
  File "interpolate.py", line 101, in video_to_tensor
    fps = md["video_fps"]
KeyError: 'video_fps'
I suspect it fails to read the frame rate for some reason.
This is one of the reasons I am asking for a manual input: #4
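Assuming md is the info dictionary returned by torchvision.io.read_video, whose documentation says it can contain a video_fps field (so it is not guaranteed), a fallback avoids the KeyError. The sketch below is hypothetical: user_fps stands for a manual value such as the one requested in #4, and the 30 fps default is an arbitrary assumption.

```python
def resolve_fps(metadata: dict, user_fps=None, default_fps: float = 30.0) -> float:
    """Pick a frame rate: explicit user value, else metadata['video_fps'], else a default."""
    if user_fps is not None:
        return float(user_fps)
    fps = metadata.get("video_fps")  # .get avoids the KeyError when the field is missing
    return float(fps) if fps else default_fps
```

In the script, fps = md["video_fps"] would become something like fps = resolve_fps(md, args_fps), where args_fps is a hypothetical command-line option.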
Thanks for your great work!
Have you compared its performance and output quality with RIFE?
Hello, is it possible to run interpolate.py with PNG frames as input instead of a video?
It would be very helpful for my use case, as I do preprocessing with ffmpeg.
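When frames come from ffmpeg, the file names are not always zero-padded (an output pattern like %d.png gives 1.png, 2.png, ..., 10.png), and plain string sorting then puts 10.png before 2.png. Whether interpolate.py sorts folder input this way is not confirmed here; natural_key is a hypothetical helper for ordering frames correctly before they are loaded.

```python
import re


def natural_key(name: str):
    """Split digit runs from text so 'f10.png' sorts after 'f9.png'."""
    return [int(tok) if tok.isdigit() else tok.lower() for tok in re.split(r"(\d+)", name)]


files = ["f10.png", "f2.png", "f1.png"]
print(sorted(files, key=natural_key))  # ['f1.png', 'f2.png', 'f10.png']
```

Alternatively, asking ffmpeg for zero-padded names (e.g. %06d.png) makes ordinary sorting work.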
Hi, I downloaded GOPRO_Large_all.zip from https://seungjunnah.github.io/Datasets/gopro.html and found that it contains 33 folders, but I did not see any indication of which folders belong to the training set and which to the test set. Is this link right?
Hi,
First of all thanks for this amazing work you did and for sharing it with the ML/AI community.
A Colab notebook to do 2X slow-motion filtering is now available in my GitHub space: https://github.com/virtualramblas/python-notebooks-repo/tree/main/Colab/FLAVR
Please let me know if you are interested, so that I can open a pull request to merge it into this repository.
Thanks.
Best Regards,
Guglielmo
Hi, thank you for sharing the code on GitHub. The code performs well in testing, but during training the PSNR always stays at about 17 dB. What could be the reason?
Hi, thanks for your efforts in benchmarking existing models. Which repo did you use for the quadratic video interpolation (QVI) model? Could you please share the link?
Hi there,
I was just wondering how you created the UCF101 dataset for your experiments? The only version of the dataset I can find is either still in avi form or has only 2 frames. For a fair comparison, I would like to use the same dataset as you, is this available somewhere?
Great work, Tarun. Could you share an updated version in which FLAVR inference and training (on a small number of images) can also be run on Windows, e.g. from PyCharm?
Hi,
I am using the pretrained 8x model to interpolate the demo sprite video shown on the project homepage, but the output seems to "pause" every second. Do you know why? Thanks!
Hey team! Congrats on the amazing work.
When I modify the config file (e.g. reducing the batch size), it does not seem to affect the behaviour of the script (interpolate.py) when launched.
I'm pretty new to Python, so I may be misunderstanding something here, but just in case!
Best,
Lucien
Hello, thanks for your brilliant work. I have a problem with fine-tuning: when I fine-tune your model on my own dataset, the fine-tuned model produces flickering videos. When I output the predicted frames, I found that they were darker than the adjacent frames. I then tried training the model from scratch with Unet34 but got similarly dark results. The PSNR and training loss kept improving, but the inference results got worse.
Could you please explain what might be happening?
Here are the training details:
python main.py --batch_size 8 --test_batch_size 8 --dataset vimeo90K_septuplet --loss 1L1 -max_epoch 200 --lr 0.00001 --n_outputs 1
Namespace(batch_size=8, beta1=0.9, beta2=0.99, checkpoint_dir='.', cuda=True, data_root='/vimeo_septuplet', dataset='vimeo90K_septuplet', exp_name='exp', joinType='concat', load_from=None, log_iter=60, loss='1L1', lr=1e-05, max_epoch=200, model='unet_18', n_outputs=1, nbr_frame=4, nbr_width=1, num_gpu=1, num_workers=16, pretrained='FLAVR_2x.pth', random_seed=12345, resume=False, resume_exp=None, start_epoch=0, test_batch_size=8, upmode='transpose', use_tensorboard=False, val_freq=1)
So I wanted to run this on a custom set of videos, but I'm unsure how to set up the data properly to do so. I currently have another folder inside with my videos, and if I run:
python test.py --dataset vimeo90K_septuplet --data_root "vimeo90K_septuplet" --load_from "./FLAVR_2x.pth" --n_outputs 1
where the vimeo90k_septuplet folder just contains my two videos, I get the error
FileNotFoundError: [Errno 2] No such file or directory: 'vimeo90K_septuplet\\sep_trainlist.txt'
I'm unsure how to set up the text file for this (and there may be more setup issues), so I'm not quite sure where to go from here. Any help would be really appreciated.
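Raw video files in the data root will not be picked up by this loader. Assuming it expects the Vimeo-90K septuplet layout (folders of im1.png ... im7.png listed in sep_trainlist.txt / sep_testlist.txt), each video would first have to be split into frames and grouped into 7-frame clips. The sketch only plans that grouping; septuplet_plan is a hypothetical helper, and the non-overlapping stride and 4-digit clip ids are arbitrary choices.

```python
def septuplet_plan(video_name: str, num_frames: int, clip_len: int = 7, stride: int = 7):
    """Plan 7-frame clips for one video in a Vimeo-septuplet-like layout.

    Returns (list_line, frame_indices) pairs: list_line is "<video_name>/<clip_id>",
    and frame_indices[k] would be saved as sequences/<list_line>/im{k+1}.png.
    """
    plan = []
    starts = range(0, num_frames - clip_len + 1, stride)
    for clip_no, start in enumerate(starts, start=1):
        plan.append((f"{video_name}/{clip_no:04d}", list(range(start, start + clip_len))))
    return plan


for line, idx in septuplet_plan("myvideo", 15):
    print(line, idx)  # myvideo/0001 [0..6], then myvideo/0002 [7..13]
```

The list_line values are what would go, one per line, into the list file.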