dstt's Issues
How to align the consecutive frames or patches?
Nice work! Do you apply any alignment, such as optical flow or an affine transformation, to these image patches?
Is the frame the ground truth? The paper says there is no ground truth for training.
Line 252 in 0b16ff1
FLOPs calculation
@ruiliu-ai Could you please release your FLOPs calculation code? Thanks in advance.
How does this network realize self-training?
The paper spends most of its space on the generator network. But since there is no ground truth, do we need self-training to realize object removal? If so, how is the self-training implemented? As far as I know, GAN training is supervised.
Thanks for your answer!
Can you please point out the download links of the dataset you use?
Hi author,
There seems to be no YouTube-VOS download link at https://competitions.codalab.org/competitions/19544, and there are too many DAVIS download links at https://davischallenge.org/davis2017/code.html
Can you please give us a concrete hint about it?
About GPU issues
Algorithm output format (mp4, other, etc)
Hi, thanks for the code. Can I change the output format, or should I convert the result after running the algorithm (from mp4 to png, for example)?
Question about the inference speed.
Hi, friend.
When I tested the inference speed of the models, I got results that differ from Figure 1 in your paper.
For STTN, I got about 11 FPS.
Could you tell me how you tested it?
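FPS figures depend heavily on how the timing is done (warm-up runs, how many frames each forward pass covers, whether the GPU is synchronized before the clock is read). The following is only a minimal sketch of one common way to measure it, not the authors' procedure; `measure_fps`, `run_once`, `frames_per_call` and `sync` are hypothetical names introduced for illustration.

```python
import time


def measure_fps(run_once, frames_per_call, warmup=3, repeats=10, sync=None):
    """Time `run_once` (a zero-argument callable) and return frames per second.

    `sync` is an optional zero-argument callable invoked before the clock is
    read. With a PyTorch model on a GPU one would typically pass
    torch.cuda.synchronize, because CUDA kernels are launched asynchronously.
    """
    # Warm-up calls are excluded from the measurement.
    for _ in range(warmup):
        run_once()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start

    return repeats * frames_per_call / elapsed


if __name__ == "__main__":
    # Stand-in workload; with a real model this could be
    # lambda: model(masked_imgs), with frames_per_call set to the
    # number of frames processed in one forward pass.
    fps = measure_fps(lambda: time.sleep(0.01), frames_per_call=5)
    print(f"{fps:.1f} FPS")
```

Reporting the warm-up count, the number of frames per call and the input resolution alongside the FPS value makes numbers from different setups easier to compare.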
Request for pretrained Discriminator
Could you also release the pretrained Discriminator? I wonder if it can be used for measuring the quality of the inpainted result.
Some questions about the inference speed.
I have only recently started looking into video inpainting research. The DAVIS and YouTube-VOS test sets provide only one mask per video. How did you use these datasets for testing?
Some questions about the paper!
In Section 3.2 of your paper, F is split into s^2 zones, and the total number is then given as t * s^2 * n. Why does this number need to be multiplied by n? Shouldn't it be t * s^2?
Looking forward to your reply.
Questions about pos_embedding
Hello,
This is great work, and thanks for sharing the code!
Why is there no pos_embedding in the code, given that a transformer is not aware of the temporal order of its inputs? Any hints would be appreciated, thanks!
Best,
Kejie
When will you release your code?
When will you release your code?
Input resolutions other than 432x240
I am trying to test your work with input images of resolution 640x448 but I keep getting the following error:
File "/home/cosmos/AI/DSTT/model/DSTT.py", line 241, in forward
key = key.view(b, t, 2, self.h//2, 2, self.w//2, self.head, c_h)
RuntimeError: shape '[1, 11, 2, 10, 2, 18, 4, 128]' is invalid for input of size 11556864
Is it only possible to use 432x240 input images with the pre-trained model?
If so, would I need to train a new model specifically for the 640x448 resolution?
I also tried to change the resolution in youtube-vos.json and run the train.py script but I get a similar error there.
Can you explain how to make it work for resolutions other than 432x240?
Thank you!
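The numbers in the error message can be checked by hand. The sketch below compares the element count implied by the requested view shape with the reported input size, and then tries to reproduce both token grids. The downsampling factor of 4 and the patch kernel, stride and padding of 7, 3 and 3 are assumptions, chosen because they reproduce the figures in the traceback; `conv_out` and `token_grid` are hypothetical helpers, not functions from the repository.

```python
from math import prod


def conv_out(size, kernel, stride, padding):
    """Output length of a strided sliding window along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def token_grid(height, width, downsample=4, kernel=7, stride=3, padding=3):
    """Token grid (rows, cols) for an input frame, under the assumed settings."""
    return (
        conv_out(height // downsample, kernel, stride, padding),
        conv_out(width // downsample, kernel, stride, padding),
    )


# Element count the view() call asks for, taken from the error message.
requested = prod([1, 11, 2, 10, 2, 18, 4, 128])
# Element count the tensor actually has, also from the error message.
actual = 11556864
# Tokens per frame actually present: divide out t=11, head=4, c_h=128.
tokens_per_frame = actual // (11 * 4 * 128)

print(requested)               # 4055040
print(tokens_per_frame)        # 2052
print(token_grid(240, 432))    # (20, 36), i.e. self.h=20, self.w=36 in the view
print(token_grid(448, 640))    # (38, 54), and 38 * 54 == 2052
```

If these assumptions hold, the view fails because the token grid for a 640x448 input is 38x54, while the shape in the error corresponds to a 20x36 grid (10 = 20 // 2, 18 = 36 // 2), which is the grid of a 432x240 input.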
About HierarchyEncoder
Hi, the paper states that group convolution isolates the interaction between feature maps of different scales in order to preserve the spatial structure. In theory, x0 and out0 should be concatenated directly, without grouping. However, in the code, the Fj and F1 layer feature maps are grouped before being concatenated along the channel dimension. Does this operation lead to information interaction between feature maps of different scales?
def forward(self, x):
    bt, c, h, w = x.size()
    out = x
    for i, layer in enumerate(self.layers):
        if i % 2 == 0 and i != 0:
            g = self.group[i//2]
            x0 = x.view(bt, g, -1, h, w)
            out0 = out.view(bt, g, -1, h, w)
            out = torch.cat([x0, out0], 2).view(bt, -1, h, w)
        out = layer(out)
    return out
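To see what the view, cat, view sequence does to the channel order, it can help to trace channel labels instead of tensors. This is only an illustrative sketch in plain Python: `grouped_concat` is a hypothetical helper that mimics the reshaping on lists of labels, and the channel counts are made up.

```python
def grouped_concat(x_channels, out_channels, groups):
    """Mimic view(bt, g, -1, h, w) + cat(dim=2) + view(bt, -1, h, w) on labels.

    Each input is split into `groups` contiguous chunks; chunk k of x is
    followed by chunk k of out, and the chunks are then flattened again.
    """
    assert len(x_channels) % groups == 0 and len(out_channels) % groups == 0
    x_size = len(x_channels) // groups
    out_size = len(out_channels) // groups
    merged = []
    for k in range(groups):
        merged += x_channels[k * x_size:(k + 1) * x_size]
        merged += out_channels[k * out_size:(k + 1) * out_size]
    return merged


x = ["x0", "x1", "x2", "x3"]
out = ["o0", "o1", "o2", "o3"]

# Grouped concatenation, as in the forward() above with g = 2.
print(grouped_concat(x, out, 2))
# ['x0', 'x1', 'o0', 'o1', 'x2', 'x3', 'o2', 'o3']

# Plain channel concatenation for comparison.
print(x + out)
# ['x0', 'x1', 'x2', 'x3', 'o0', 'o1', 'o2', 'o3']
```

A grouped convolution slices its input channels into contiguous blocks. Assuming the following layer is a grouped convolution with the same number of groups, the grouped layout hands each group the x and out channels belonging to that group only, whereas a plain concatenation would hand the first group only x channels and the second only out channels.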
License
Hey, what is the license of this repo?
Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed
When I run test.py, it returns an error like this:
python3 test.py -c *** -v *** -m ***
The error is:
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda ->auto::operator()(int)->auto: block: [222,0,0], thread: [95,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
  File "test.py", line 162, in <module>
    main_worker()
  File "test.py", line 135, in main_worker
    pred_img = model(masked_imgs)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/ProjectRoot/test/Video_inpainting/DSTT-master/model/DSTT.py", line 144, in forward
    enc_feat = self.encoder(masked_frames.view(bt, c, h, w))
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
Generate Mask for custom video
Hi, thanks for the code!
Can you recommend any code, technique, or framework for producing the segmentation masks (like --mask examples/schoolgirls) that your algorithm needs as input? Thanks in advance!
Question about the dataset
Sorry to bother you, but I can't download the dataset from Google Drive.
I also tried downloading the dataset from OneDrive and ran into a "CSC error".
Is there any other way for us to download the YouTube-VOS dataset? Thank you.
Code
Where is the code?